This is a simple example of ensemble clustering using Python and the scikit-learn library.
Ensemble clustering combines the results of multiple clustering algorithms, or multiple runs of the same algorithm, into a single consensus partition. The goal is a clustering solution that is more robust and reliable than any individual base clustering, obtained by exploiting the diversity among them.
Key concepts of ensemble clustering:
- Base clusterings: the individual partitions produced by different algorithms, parameter settings, or random initializations.
- Diversity: the base clusterings should disagree in useful ways; identical partitions add nothing to the ensemble.
- Consensus function: the rule that merges the base partitions into one result, for example through a co-association matrix (see the small sketch below).
- Robustness: the consensus partition is typically less sensitive to the quirks of any single algorithm or run.
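The consensus idea is easiest to see on a toy example. The sketch below uses two hand-made label vectors (values chosen purely for illustration) to build a co-association matrix, the same structure the full example constructs later on.

import numpy as np

# Two base clusterings of five points, encoded as label vectors
# (hand-picked values, for illustration only)
labels_a = np.array([0, 0, 1, 1, 2])
labels_b = np.array([0, 0, 0, 1, 1])

# Co-association matrix: fraction of base clusterings that place each
# pair of points in the same cluster
co = ((labels_a[:, None] == labels_a[None, :]).astype(float) +
      (labels_b[:, None] == labels_b[None, :]).astype(float)) / 2
print(co)
# Points 0 and 1 co-occur in both clusterings (entry 1.0), while points
# 2 and 3 co-occur in only one of them (entry 0.5)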
Python Source Code:
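Scikit-learn does not ship a ready-made ensemble clustering estimator, so the listing below implements the combination step directly with a co-association (consensus) matrix: each base algorithm is run once, the matrix records how often each pair of points is grouped together, and a final agglomerative step extracts the consensus partition. Treat this as a sketch of one common consensus scheme rather than the only way to build the ensemble.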
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score
# Generate synthetic data with three clusters
X, y = make_blobs(n_samples=300, centers=3, random_state=42)
# Define base clustering algorithms
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
agg_clustering = AgglomerativeClustering(n_clusters=3)
# Run each base algorithm and collect its cluster labels
base_labels = [
    kmeans.fit_predict(X),
    agg_clustering.fit_predict(X),
]
# Build the co-association matrix: entry (i, j) is the fraction of
# base clusterings that place points i and j in the same cluster
n_samples = X.shape[0]
co_association = np.zeros((n_samples, n_samples))
for labels in base_labels:
    co_association += (labels[:, None] == labels[None, :]).astype(float)
co_association /= len(base_labels)
# Extract the consensus partition by clustering the co-association matrix,
# treating 1 - co_association as a precomputed distance
# (on scikit-learn < 1.2, pass affinity='precomputed' instead of metric)
consensus = AgglomerativeClustering(n_clusters=3, metric='precomputed',
                                    linkage='average')
ensemble_labels = consensus.fit_predict(1 - co_association)
# Evaluate the performance using Adjusted Rand Index (ARI)
ari_score = adjusted_rand_score(y, ensemble_labels)
print(f'Adjusted Rand Index (ARI) of Ensemble Clustering: {ari_score:.2f}')
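# Optional comparison: ARI of each base clustering on its own, to check
# whether the consensus is at least as consistent with the ground truth
# (the names below simply label the two base estimators defined above)
for name, labels in zip(['kmeans', 'agg_clustering'], base_labels):
    print(f'Adjusted Rand Index (ARI) of {name}: {adjusted_rand_score(y, labels):.2f}')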
# Plot the ground-truth labels and the ensemble clustering results
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', edgecolors='k')
plt.title('Ground-Truth Clusters')
plt.subplot(1, 2, 2)
plt.scatter(X[:, 0], X[:, 1], c=ensemble_labels, cmap='viridis', edgecolors='k')
plt.title('Ensemble Clustering')
plt.show()
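Note that the consensus step above is only one choice: treating 1 - co_association as a distance matrix and applying average-linkage agglomerative clustering. Other consensus functions, such as graph partitioning of the co-association matrix or majority voting after aligning cluster labels across runs, are common alternatives.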
Explanation: